LINEAR REGRESSION TO A LOWER ORDER MODEL:
EFFECTS AND IMPLICATIONS

INTRODUCTION 


Linear  regression  is  used  extensively  in  the  fields  of  science, 
engineering, and business. In many instances, because the data-generating
process  can  be  complex,  exhibit  random  effects,  or  be  unknown,  the  regression 
model  used  only  approximates  the  actual  process  model.  For  these  reasons, 
regression  models  of  reduced  order  are  often  used.  Such  models  are  designed 
to  be  as  accurate  as  necessary  for  their  intended  application  without 
retaining  unnecessary  and  noise-sensitive,  higher  order  terms. 

When  regression  is  used  for  noise  suppression,  the  order  of  the 
regression  model  required  can  be  a  function  of  the  noise  level  encountered. 
Under  high-noise  conditions,  the  errors  incurred  by  using  a  reduced-order 
regression model can be negligible when compared with the noise uncertainty.
Under  low-noise  conditions,  however,  the  errors  can  become  significant. 

When  the  regression  parameters  are  to  be  related  to  process  state 
parameters  for  subsequent  processing,  it  is  very  important  that  the 
relationship  be  properly  formed.  Use  of  a  reduced-order  regression  model  can 
produce  unexpected  biases,  which  can  be  accounted  for  in  the  state  relation  if 
interpreted  properly.  Such  a  problem  was  encountered  when  hierarchical 
processing  was  applied  to  the  contact  motion  analysis  problem.  In  this 
application,  a  bearing  sequence,  which  is  related  to  the  state  by  means  of  the 
arctangent  function,  was  characterized  by  a  second-order  regression  model. 

When  relating  the  regression  coefficients  to  the  state  parameters,  the  bias 
resulting  from  the  nonzero  derivatives  of  the  arctangent  function  proved 
significant  under  low-noise  conditions.  Application  of  the  analysis  presented 
in  this  report  allowed  for  bias  compensation  and  the  concomitant  improvement 
in  estimator  performance. 

This  report  presents  a  brief  review  of  linear  regression  and  an  analysis 
of  the  effects  of  using  a  reduced-order  regression  model.  In  addition,  a 
parallel  analysis  using  the  Householder  transformation,  which  provides  a 
convenient  computational  scheme  for  bias  compensation,  is  discussed. 


LEAST-SQUARES REGRESSION


A  process  of  some  fixed  order  can  be  represented  by  a  finite  Taylor 
series  about  a  chosen  point.  If  noise-free  measurements  are  available,  the 
Taylor  series  coefficients  can  be  calculated  exactly  provided  the  number  of 
measurements  is  at  least  equal  to  the  order  of  the  process.  In  this  case,  the 
system  of  measurement  equations,  with  the  Taylor  series  coefficients  as 
unknowns, forms a complete set of linear equations. When the number of
measurements is greater than the order of the process, the system is
overdetermined, and redundant equations can be ignored. The problem becomes more
complex when measurement noise is present.

The most common method of solving an overdetermined system of equations
in  the  presence  of  noise  is  to  use  the  method  of  least  squares  (references  1 
and  2).  Consider  the  system  of  equations 

$$ A x = b, \tag{1} $$

where $x$ is the $n \times 1$ vector of unknown coefficients, $A$ is the $m \times n$ system
matrix ($m > n$) created from the samples of the independent variable, and $b$ is
the $m \times 1$ measurement vector. The least-squares solution is derived by first
introducing the $m \times 1$ error vector

$$ e = b - A x. \tag{2} $$

The objective in the least-squares technique is to minimize the squared
magnitude of the error vector,

$$ \|e\|^2 = e^T e = (b - A x)^T (b - A x), \tag{3} $$

giving the least-squares solution

$$ x = (A^T A)^{-1} A^T b = A^{\#} b. \tag{4} $$

The matrix $A^{\#} = (A^T A)^{-1} A^T$ is called the generalized inverse or
pseudoinverse of the matrix $A$ (references 1 and 2). The product $A A^{\#}$ is the
orthogonal projection matrix that projects an arbitrary vector into the
subspace spanned by the columns of $A$.

In the problem at hand, the rows of the matrix equation ($Ax = b$) are
simply samples of the Taylor series representation, which results in the
matrix $A$ assuming a distinct form. To illustrate the form of $A$, consider the
$n$th-order Taylor series polynomial about the point $t_0$ given by

$$ b(t) = x_0 + x_1 (t - t_0) + x_2 (t - t_0)^2 + \cdots + x_n (t - t_0)^n, \tag{5} $$

where

$$ x_0 = b(t_0), \qquad x_i = b^{(i)}(t_0) / i!, \quad i > 0, \tag{6} $$

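with $b^{(i)}(t)$ being the $i$th derivative of $b(t)$. With no loss in generality,
$t_0 = 0$ may be selected. For $m$ samples of the function taken at points
$t_1, t_2, \ldots, t_m$, the resulting set of equations is

$$
\begin{aligned}
b(t_1) &= x_0 + x_1 t_1 + x_2 t_1^2 + \cdots + x_n t_1^n, \\
b(t_2) &= x_0 + x_1 t_2 + x_2 t_2^2 + \cdots + x_n t_2^n, \\
&\;\;\vdots \\
b(t_m) &= x_0 + x_1 t_m + x_2 t_m^2 + \cdots + x_n t_m^n,
\end{aligned}
\tag{7}
$$

which may be written in the form of equation (1) with matrix $A$ defined as

$$
A = \begin{bmatrix}
1 & t_1 & t_1^2 & \cdots & t_1^n \\
1 & t_2 & t_2^2 & \cdots & t_2^n \\
\vdots & \vdots & \vdots & & \vdots \\
1 & t_m & t_m^2 & \cdots & t_m^n
\end{bmatrix}
\quad \text{and} \quad
b = [b(t_1), b(t_2), \ldots, b(t_m)]^T.
\tag{8}
$$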


An estimate of the $n$th-order Taylor series coefficients $x$ based on $m$
noisy measurement samples $y_i = b(t_i) + w_i$, with $w_i$ being the measurement
noise, may be obtained by solving the least-squares equation (4) using the
system matrix of equation (8) and $b = [y_1, y_2, \ldots, y_m]^T$.
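
As a minimal sketch of equations (4) and (8), the following Python/NumPy fragment builds the system matrix from the sample times and solves for the coefficients. The function name `fit_taylor_coefficients`, the test polynomial, the seed, and the noise level are illustrative assumptions and do not come from the report. `numpy.linalg.lstsq` is used here in place of an explicit inverse.

```python
import numpy as np

def fit_taylor_coefficients(t, y, order):
    """Least-squares estimate of x_0..x_order from samples y taken at times t."""
    # System matrix of equation (8): columns 1, t, t^2, ..., t^order.
    A = np.vander(t, order + 1, increasing=True)
    # Minimizes ||y - A x||^2 without forming (A^T A)^-1 explicitly.
    x, *_ = np.linalg.lstsq(A, y, rcond=None)
    return x

rng = np.random.default_rng(0)             # fixed seed, illustrative only
t = np.linspace(-1.0, 1.0, 201)
x_true = np.array([1.0, -2.0, 0.5])        # hypothetical coefficients
noise = 0.01 * rng.standard_normal(t.size)
y = np.vander(t, 3, increasing=True) @ x_true + noise
x_hat = fit_taylor_coefficients(t, y, order=2)
```

With the model order matched to the process order, as here, `x_hat` differs from `x_true` only through the noise.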


The matrix $A$ takes a particularly convenient form when the measurement
points $t_k$ are uniformly spaced and (again, with no loss in generality) are
symmetrical about the point $t_0 = 0$; hence, $t_k = k \Delta t$ for $k = -m/2, \ldots, m/2$.
(Here, $m$ is considered to be an even number for simplicity; similar results
hold for $m$ odd.) Now, the matrix $A$ has the form


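$$
A = \begin{bmatrix}
1 & (-m/2)\Delta t & [(-m/2)\Delta t]^2 & \cdots & [(-m/2)\Delta t]^n \\
\vdots & \vdots & \vdots & & \vdots \\
1 & (m/2)\Delta t & [(m/2)\Delta t]^2 & \cdots & [(m/2)\Delta t]^n
\end{bmatrix}
\tag{9}
$$

and $A^T A$ is given by

$$
A^T A = \begin{bmatrix}
m + 1 & \sum (i \Delta t) & \cdots & \sum (i \Delta t)^n \\
\sum (i \Delta t) & \sum (i \Delta t)^2 & \cdots & \sum (i \Delta t)^{n+1} \\
\vdots & \vdots & & \vdots \\
\sum (i \Delta t)^n & \sum (i \Delta t)^{n+1} & \cdots & \sum (i \Delta t)^{2n}
\end{bmatrix},
\tag{10}
$$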
where the summation is over $i = -m/2$ to $m/2$. Again, an estimate of the
coefficient vector $x$ may be obtained by solving the least-squares equation
(4). Estimation using the least-squares criterion is nothing new; what is of
interest here is the implication of using a model order for estimation that is
lower than the actual process model order.
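
The structure of equation (10) can be checked numerically. The sketch below, with illustrative values of `m`, `dt`, and `n` that are not from the report, confirms that each entry of $A^T A$ is a power sum of the sample times and that the odd-power sums cancel on a symmetric interval.

```python
import numpy as np

m, dt, n = 10, 0.5, 3                      # illustrative values (m even)
i = np.arange(-m // 2, m // 2 + 1)         # i = -m/2, ..., m/2 (m + 1 points)
A = np.vander(i * dt, n + 1, increasing=True)    # equation (9)
AtA = A.T @ A                                    # equation (10)

# Entry (r, c) is the sum of (i*dt)^(r+c); odd powers cancel by symmetry.
power_sums = np.array([[np.sum((i * dt) ** (r + c)) for c in range(n + 1)]
                       for r in range(n + 1)])
```

The vanishing odd-power entries are what decouple the even-order and odd-order coefficients in the analysis of the reduced-order fit.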


ESTIMATION TO A LOWER ORDER MODEL


When the process of interest is characterized by a Taylor series, for
many practical applications, the higher order terms in the series are of
significantly lower magnitude when compared with the lower order terms. It is
often the case that the noise level will result in estimation errors that are
comparable with the magnitude of these higher order terms. For this reason,
it is often desirable to model the system with a lower order, providing a more
robust estimation of the desired coefficients. Also, in some instances, it
may be possible for the Taylor series coefficients of the higher order terms
to be expressed as functions of the lower order coefficients. Under such
conditions, the lower order terms completely describe the system dynamics,
even though there may be nonzero, higher order terms. Here again, it may be
desirable to estimate the coefficients for the lowest order model that
satisfactorily describes the system dynamics. However, care must be taken when
interpreting the resulting coefficient estimates, because the basis functions
of the Taylor series are not orthogonal. The interpretation becomes
particularly important when the coefficient estimates are to be compared with
predictions from a state parameter model.


To illustrate the effects of fitting to a lower order, consider the
following process described by the fourth-order model:

$$ y(t) = q_0 + q_1 t + q_2 t^2 + q_3 t^3 + q_4 t^4. \tag{11} $$

Then, consider the following case where the estimation is performed for a
model of the second order:

$$ y(t) = x_0 + x_1 t + x_2 t^2. \tag{12} $$

Again,  for  simplicity,  consider  a  symmetric  time  interval,  which  has  been 
uniformly  sampled.  The  impact  of  the  reduced-order  estimation  can  be 
evaluated  by  solving  the  least-squares  equations  under  noise-free  conditions. 
Under these conditions, the measurements are simply noise-free samples of the
process function, which is shown as

$$
b = \begin{bmatrix}
q_0 + q_1 (-m/2)\Delta t + \cdots + q_4 [(-m/2)\Delta t]^4 \\
\vdots \\
q_0 + q_1 (0)\Delta t + \cdots + q_4 [(0)\Delta t]^4 \\
\vdots \\
q_0 + q_1 (m/2)\Delta t + \cdots + q_4 [(m/2)\Delta t]^4
\end{bmatrix}.
\tag{13}
$$
This may be written as $b = Qq$, where $Q$ is the $(m + 1) \times 5$ matrix in the form of
equation (9) and $q$ is the vector of coefficients from equation (11), such that

$$ q = [q_0, q_1, q_2, q_3, q_4]^T. \tag{14} $$

From equations (9) and (12), it may be seen that the matrix $A$ is given by

$$
A = \begin{bmatrix}
1 & (-m/2)\Delta t & [(-m/2)\Delta t]^2 \\
\vdots & \vdots & \vdots \\
1 & (0)\Delta t & [(0)\Delta t]^2 \\
\vdots & \vdots & \vdots \\
1 & (m/2)\Delta t & [(m/2)\Delta t]^2
\end{bmatrix}.
\tag{15}
$$

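From equation (10), $A^T A$ takes the form

$$
A^T A = \begin{bmatrix}
m + 1 & 0 & 2(\Delta t)^2 \sum_{i=1}^{m/2} i^2 \\
0 & 2(\Delta t)^2 \sum_{i=1}^{m/2} i^2 & 0 \\
2(\Delta t)^2 \sum_{i=1}^{m/2} i^2 & 0 & 2(\Delta t)^4 \sum_{i=1}^{m/2} i^4
\end{bmatrix},
\tag{16}
$$

and the product $A^T b$ takes the form

$$ A^T b = A^T Q q, \tag{17} $$

where

$$
A^T Q = \begin{bmatrix}
m + 1 & 0 & 2(\Delta t)^2 \sum_{i=1}^{m/2} i^2 & 0 & 2(\Delta t)^4 \sum_{i=1}^{m/2} i^4 \\
0 & 2(\Delta t)^2 \sum_{i=1}^{m/2} i^2 & 0 & 2(\Delta t)^4 \sum_{i=1}^{m/2} i^4 & 0 \\
2(\Delta t)^2 \sum_{i=1}^{m/2} i^2 & 0 & 2(\Delta t)^4 \sum_{i=1}^{m/2} i^4 & 0 & 2(\Delta t)^6 \sum_{i=1}^{m/2} i^6
\end{bmatrix}.
\tag{18}
$$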


The sums in equations (16) and (18) may be replaced by their closed-form
equivalents and, for large $m$, approximated by the most significant term. The
resulting approximations follow, i.e.,
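
$$
A^T A \approx \begin{bmatrix}
m & 0 & \dfrac{(\Delta t)^2 m^3}{12} \\
0 & \dfrac{(\Delta t)^2 m^3}{12} & 0 \\
\dfrac{(\Delta t)^2 m^3}{12} & 0 & \dfrac{(\Delta t)^4 m^5}{80}
\end{bmatrix}
\tag{19}
$$

and

$$
A^T Q \approx \begin{bmatrix}
m & 0 & \dfrac{(\Delta t)^2 m^3}{12} & 0 & \dfrac{(\Delta t)^4 m^5}{80} \\
0 & \dfrac{(\Delta t)^2 m^3}{12} & 0 & \dfrac{(\Delta t)^4 m^5}{80} & 0 \\
\dfrac{(\Delta t)^2 m^3}{12} & 0 & \dfrac{(\Delta t)^4 m^5}{80} & 0 & \dfrac{(\Delta t)^6 m^7}{448}
\end{bmatrix}.
\tag{20}
$$

From the resulting least-squares solution for $x$,

$$ x = (A^T A)^{-1} A^T Q q, \tag{21} $$

with

$$
(A^T A)^{-1} A^T Q \approx \begin{bmatrix}
1 & 0 & 0 & 0 & -\dfrac{3(\Delta t)^4 m^4}{560} \\
0 & 1 & 0 & \dfrac{3(\Delta t)^2 m^2}{20} & 0 \\
0 & 0 & 1 & 0 & \dfrac{3(\Delta t)^2 m^2}{14}
\end{bmatrix},
\tag{22}
$$
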


it  may  be  seen  that  the  estimates  of  the  components  of  x  are  not  exactly  equal 
to  the  corresponding  components  of  q.  As  an  example,  look  at  the  estimate  of 
$x_0$, such that

$$ x_0 = q_0 - \frac{3(\Delta t)^4 m^4}{560} q_4. \tag{23} $$

The use of a reduced-order estimation model produces a bias on the Taylor
series coefficient estimates.
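
The bias of equation (23) can be reproduced numerically. In the sketch below, the sampling values `m` and `dt` and the process coefficients `q` are hypothetical choices made only for illustration. A second-order model is fit to noise-free samples of a fourth-order process, and the resulting offset in $x_0$ is compared with the large-$m$ expression.

```python
import numpy as np

m, dt = 200, 0.01                          # illustrative: m + 1 uniform samples
t = np.arange(-m // 2, m // 2 + 1) * dt
q = np.array([1.0, 0.5, -0.3, 0.2, 0.8])   # hypothetical fourth-order process

Q = np.vander(t, 5, increasing=True)       # fourth-order sample matrix
A = Q[:, :3]                               # second-order regression matrix
b = Q @ q                                  # noise-free measurements, b = Qq

x, *_ = np.linalg.lstsq(A, b, rcond=None)  # reduced-order fit
bias_x0 = x[0] - q[0]
predicted = -3.0 * dt**4 * m**4 * q[4] / 560.0   # large-m form of equation (23)
```

For finite $m$, the exact sums differ slightly from their leading terms, so `bias_x0` and `predicted` should agree closely but not exactly.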


To  evaluate  whether  this  biasing  effect  is  significant,  one  must  look  at 
its  magnitude  relative  to  the  estimated  noise  variance.  From  the  Cramer-Rao 
inequality,  it  may  be  seen  that  the  covariance  of  an  unbiased  estimator  is 
bounded  below  by  the  inverse  of  the  Fisher  information  matrix  (reference  2). 
While $x_0$ is a biased estimate of $q_0$, it is an unbiased estimate of the
parameter for which the right-hand side of equation (23) is an approximation.
This is also the case for the other elements of the vector $x$. The inverse of
the Fisher information matrix, $\sigma_w^2 (A^T A)^{-1}$ for homoskedastic noise with
variance $\sigma_w^2$ (reference 2), is a lower bound to the covariance on $x$, so that

$$
\operatorname{cov}(x) \ge \sigma_w^2 (A^T A)^{-1} \approx \sigma_w^2
\begin{bmatrix}
\dfrac{9}{4m} & 0 & \dfrac{-15}{(\Delta t)^2 m^3} \\
0 & \dfrac{12}{(\Delta t)^2 m^3} & 0 \\
\dfrac{-15}{(\Delta t)^2 m^3} & 0 & \dfrac{180}{(\Delta t)^4 m^5}
\end{bmatrix}.
\tag{24}
$$


It may be seen that $\sigma_{x_0}^2 \ge (9/4m)\,\sigma_w^2$ and, from equation (23), the bias error
due to the reduced-order fit is

$$ e_b = -\frac{3(\Delta t)^4 m^4 q_4}{560}. \tag{25} $$


The bias effect becomes significant when the corresponding standard deviation
is on the order of the bias magnitude, i.e., when

$$ \sigma_w \approx \frac{(\Delta t)^4 m^{9/2} |q_4|}{280}. \tag{26} $$

This gives the noise level or, conversely, the magnitude of the process
parameter $q_4$, for which the biasing effect of using a lower order model
becomes significant.
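
Equation (26) follows from equating the standard deviation bound $\sqrt{9/(4m)}\,\sigma_w$ of equation (24) with the bias magnitude of equation (25). A small helper makes the threshold easy to evaluate. The function name and the numerical values are assumptions chosen for illustration.

```python
def bias_threshold_noise_sigma(dt, m, q4):
    """Noise standard deviation at which the x_0 bias matches its std. dev.

    Large-m form of equation (26): sigma_w ~ dt^4 * m^(9/2) * |q4| / 280.
    """
    return dt**4 * m**4.5 * abs(q4) / 280.0

sigma_w = bias_threshold_noise_sigma(dt=0.01, m=200, q4=0.8)

# Consistency check against equations (24) and (25).
std_x0 = 1.5 * sigma_w / 200**0.5                # sqrt(9 / (4 m)) * sigma_w
bias_mag = 3.0 * 0.01**4 * 200**4 * 0.8 / 560.0  # |e_b| from equation (25)
```

At noise levels well above `sigma_w`, the bias is masked by the estimation variance. Well below it, the bias dominates the error in $x_0$.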


To illustrate that the use of a reduced-order model may be advantageous
from a minimum variance perspective, look at the Cramer-Rao bound for a
first-order model, when

$$
\sigma_w^2 (A^T A)^{-1} \approx \sigma_w^2
\begin{bmatrix}
\dfrac{1}{m} & 0 \\
0 & \dfrac{12}{(\Delta t)^2 m^3}
\end{bmatrix}.
\tag{27}
$$

It may be seen that the variance on the $x_0$ parameter for a first-order model
is four-ninths of the corresponding second-order model estimate. This
illustrates the fact that, while using a reduced-order model may introduce a
bias, it can provide an estimate with significantly lower variance. In
situations where the bias can be accounted for, or under relatively high-noise
conditions, it is most advantageous to use a model of minimal order.
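
The four-ninths ratio can be checked directly from the exact matrices, without the large-$m$ approximations. The sampling values in this sketch are illustrative assumptions.

```python
import numpy as np

m, dt = 200, 0.01                          # illustrative sampling
t = np.arange(-m // 2, m // 2 + 1) * dt
A1 = np.vander(t, 2, increasing=True)      # first-order model: 1, t
A2 = np.vander(t, 3, increasing=True)      # second-order model: 1, t, t^2

# Cramer-Rao bounds per unit noise variance: diagonal of (A^T A)^-1.
var_x0_first = np.linalg.inv(A1.T @ A1)[0, 0]
var_x0_second = np.linalg.inv(A2.T @ A2)[0, 0]
ratio = var_x0_first / var_x0_second       # approaches 4/9 for large m
```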


BIAS  COMPENSATION  VIA  THE  HOUSEHOLDER  TRANSFORMATION 


Use  of  the  pseudoinverse  for  the  solution  to  the  normal  equations  is 
convenient  for  the  derivation  of  an  analytic  expression  for  noise-free 
regression  coefficients.  However,  the  analysis  uses  the  assumption  of  a 
constant  data  rate  and  is  further  simplified  by  using  approximate  expressions 
for the sums involved. For large or poorly conditioned systems, the matrix
inversion required for the pseudoinverse solution can be computationally
intensive and problematical (reference 2). To avoid these difficulties,
numerical techniques have been developed that eliminate the need for an
explicit matrix inversion. One technique commonly employed is the Householder
transformation (references 2 and 3).

The Householder transformation is a numerical technique used to effect an
orthogonal transformation on the regression model. Without delving into
details of the Householder procedure, the results of interest may be obtained
by looking at the transformation matrix that it effectively applies. The
Householder matrix $H$ transforms the regression model into the upper triangular
form


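$$ H [A : b] = \begin{bmatrix} S_x & s_x \\ 0 & s_e \end{bmatrix}, \tag{28} $$

where $S_x$ is an $n \times n$ upper triangular matrix, $s_x$ is an $n \times 1$ vector, and $s_e$ is
an $(m - n) \times 1$ vector. Using the orthogonal property of the Householder matrix
($H^T H = I$), the least-squares criterion may be rewritten as

$$
\begin{aligned}
\|e\|^2 = e^T e &= (b - Ax)^T (b - Ax) \\
&= (b - Ax)^T H^T H (b - Ax) \\
&= (Hb - HAx)^T (Hb - HAx) \\
&= \left( \begin{bmatrix} s_x \\ s_e \end{bmatrix} - \begin{bmatrix} S_x \\ 0 \end{bmatrix} x \right)^T
   \left( \begin{bmatrix} s_x \\ s_e \end{bmatrix} - \begin{bmatrix} S_x \\ 0 \end{bmatrix} x \right),
\end{aligned}
\tag{29}
$$

or

$$ \|e\|^2 = (s_x - S_x x)^T (s_x - S_x x) + s_e^T s_e. \tag{30} $$

The second term in equation (30) is independent of $x$; hence, the least-squares
solution is found by choosing the $x$ that causes the first term to vanish, i.e.,

$$ x = S_x^{-1} s_x. \tag{31} $$

Because $S_x$ is an upper triangular matrix, the inversion involved in equation
(31) is straightforward.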
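
The triangularization of equations (28) through (31) can be sketched with a QR factorization. NumPy's `numpy.linalg.qr` stands in here for the Householder procedure, with $H$ taken as the transpose of the full orthogonal factor. The data, the seed, and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)             # illustrative data only
t = np.linspace(-1.0, 1.0, 50)
A = np.vander(t, 3, increasing=True)       # m x n system matrix, n = 3
b = A @ np.array([2.0, -1.0, 0.5]) + 0.05 * rng.standard_normal(t.size)
n = A.shape[1]

# Full QR factorization: A = Qf R, so H = Qf^T triangularizes A.
Qf, R = np.linalg.qr(A, mode="complete")
H = Qf.T
S_x = R[:n, :]                             # n x n upper triangular block
Hb = H @ b
s_x, s_e = Hb[:n], Hb[n:]                  # partition of the transformed data

# Equation (31); a back-substitution would suffice because S_x is triangular.
x_qr = np.linalg.solve(S_x, s_x)
x_ne = np.linalg.solve(A.T @ A, A.T @ b)   # normal equations, equation (4)
residual_sq = s_e @ s_e                    # second term of equation (30)
```

Both routes give the same coefficients. The orthogonal route avoids forming $A^T A$, whose condition number is the square of that of $A$.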


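Again, the order of the process model is considered to be greater than
that of the regression model. Under noise-free conditions, the measurement
vector is generated by $b = Qq$, where the matrix $Q$ is of the form of equation
(8), and it can be noted that

$$ Q = [A : \tilde{A}], \tag{32} $$

where $A$ is the matrix for the lower order regression model and $\tilde{A}$ contains the
higher order components. Applying the Householder transformation to the
resulting regression model, with $\tilde{S}_x$ and $\tilde{S}_e$ denoting the upper $n$ rows and the
remaining $m - n$ rows of $H\tilde{A}$, makes the squared error

$$
\begin{aligned}
\|e\|^2 &= \left( \begin{bmatrix} S_x & \tilde{S}_x \\ 0 & \tilde{S}_e \end{bmatrix} q - \begin{bmatrix} S_x \\ 0 \end{bmatrix} x \right)^T
           \left( \begin{bmatrix} S_x & \tilde{S}_x \\ 0 & \tilde{S}_e \end{bmatrix} q - \begin{bmatrix} S_x \\ 0 \end{bmatrix} x \right) \\
&= \left( [S_x : \tilde{S}_x] q - S_x x \right)^T \left( [S_x : \tilde{S}_x] q - S_x x \right)
 + \left( [0 : \tilde{S}_e] q \right)^T \left( [0 : \tilde{S}_e] q \right).
\end{aligned}
\tag{33}
$$

Again, the second term is independent of $x$ and represents the magnitude of the
squared error resulting from the use of a reduced-order regression model. The
least-squares solution is obtained by choosing $x$ so that the first term
vanishes, i.e.,

$$
\begin{aligned}
x &= S_x^{-1} [S_x : \tilde{S}_x] q \\
  &= [I : S_x^{-1} \tilde{S}_x] q.
\end{aligned}
\tag{34}
$$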

The matrix in equation (34) is a general form of that found in equation (22),
with  none  of  the  approximations  or  assumptions  of  constant  data  rate  or 
symmetric  interval.  It  is  readily  produced  as  a  byproduct  of  the  Householder 
procedure  and  is  not  a  function  of  the  measurement  noise. 
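
A minimal sketch of equation (34) for a nonuniform, asymmetric sampling pattern follows. The sample times, the seed, and the process coefficients are illustrative assumptions, and `numpy.linalg.qr` again stands in for the Householder procedure.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.sort(rng.uniform(-0.7, 1.3, 40))    # nonuniform, asymmetric samples
Q = np.vander(t, 5, increasing=True)       # fourth-order process matrix
n = 3
A = Q[:, :n]                               # second-order regression columns

Qf, _ = np.linalg.qr(A, mode="complete")   # H = Qf^T triangularizes A
HQ = Qf.T @ Q                              # H [A : A~] = [[S_x, S~_x], [0, S~_e]]
S_x, S_tilde_x = HQ[:n, :n], HQ[:n, n:]

# Equation (34): x = [I : S_x^-1 S~_x] q
M = np.hstack([np.eye(n), np.linalg.solve(S_x, S_tilde_x)])

q = np.array([1.0, 0.5, -0.3, 0.2, 0.8])   # hypothetical process coefficients
x_fit, *_ = np.linalg.lstsq(A, Q @ q, rcond=None)
```

Here `M @ q` reproduces the reduced-order fit `x_fit` of the noise-free data, and `M` depends only on the sample times, not on the measurements.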


APPLICATION  TO  HIERARCHICAL  ESTIMATION 


The  results  of  the  previous  sections  become  particularly  relevant  when 
the  reduced-order  regression  is  performed  as  the  first  stage  in  a  hierarchical 
estimation  procedure.  As  such,  locally  estimated  regression  parameters  serve 
as  "pseudomeasurements"  for  a  second  stage  of  estimation,  which  combines  the 
local  parameters  from  multiple  data  segments  to  provide  a  global  state 
estimate.  As  stated  earlier,  it  may  be  advantageous  to  perform  a  minimal- 
order  local  estimation  to  minimize  noise  effects.  When  these  local  estimates 
are  used  as  pseudomeasurements,  the  global  measurement  model  takes  the  form  of 
equation  (34),  i.e., 

$$ b(\theta) = [I : S_x^{-1} \tilde{S}_x] \, q(\theta), \tag{35} $$

where $b(\theta)$ is the predicted pseudomeasurement that corresponds to the solution
$x$ provided by the local regression. The vector $q(\theta)$ comprises the Taylor
series coefficients based on the global state estimate $\theta$. Use of the
measurement model of equation (35) limits the modeling bias errors to a value
determined by the order to which $q(\theta)$ is extended, while retaining the noise
characteristics associated with the regression to the order of $b(\theta)$.

These developments were applied to the nonlinear state estimation problem
using data segmentation and compression (reference 4). Here, bearing data,
which are related to the state through the arctangent function, are
characterized on a segment by a second-order polynomial. For each segment, the
resulting estimates are used as the input to a second stage of processing that
performs the global nonlinear state estimation. For the problem at hand, only
bearing and its first two derivatives ($\beta$, $\dot{\beta}$, $\ddot{\beta}$) are independent, and all higher
time derivatives of bearing can be written in terms of $\beta$, $\dot{\beta}$, $\ddot{\beta}$. The use of a
second-order model to characterize the bearing curve results in biased
estimates of the Taylor series coefficients due to the nonzero higher
derivatives. However, when calculating the predicted polynomial coefficients
$b(\theta)$ from the current state estimate $\theta$, it is possible to use the correction
terms of equation (35) by simply using the predicted value of $q(\theta)$. That is,
the components of $q$ through $q_4$ are retained (corresponding to bearing time
derivatives up to the fourth). This results in a bias compensation that is
accurate to the fourth order, while the regression is carried to only a
second-order model and, hence, has lower variance than a fourth-order
regression. The resulting estimation algorithm performs well under both
high-noise and low-noise conditions.
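
The effect of the compensation can be illustrated on a simplified stand-in for the bearing curve. The sketch below uses $\arctan(t)$ itself, whose Taylor coefficients about zero are known, rather than the report's contact motion model. The segment length, sample spacing, and variable names are likewise assumptions.

```python
import numpy as np

m, dt = 60, 0.01                           # illustrative segment: t in [-0.3, 0.3]
t = np.arange(-m // 2, m // 2 + 1) * dt
beta = np.arctan(t)                        # stand-in "bearing" curve (assumption)

# Taylor coefficients of arctan(t) about 0, retained through fourth order.
q = np.array([0.0, 1.0, 0.0, -1.0 / 3.0, 0.0])

Q = np.vander(t, 5, increasing=True)
A = Q[:, :3]
x_local, *_ = np.linalg.lstsq(A, beta, rcond=None)   # second-order local fit

# Predicted pseudomeasurement with and without the correction of equation (35).
M = np.linalg.lstsq(A, Q, rcond=None)[0]   # equals [I : S_x^-1 S~_x]
b_compensated = M @ q
b_uncompensated = q[:3]

err_comp = np.max(np.abs(x_local - b_compensated))
err_uncomp = np.max(np.abs(x_local - b_uncompensated))
```

In this noise-free setting, the compensated prediction leaves only the contribution of terms beyond fourth order, so `err_comp` should be much smaller than `err_uncomp`.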


SUMMARY 


Because the actual data-gathering process can be complex, exhibit random
effects,  or  be  unknown,  it  is  often  useful  to  implement  a  minimal-order  model 
when  estimating  process  parameters.  In  cases  where  robustness  and  accuracy 
are  of  interest  and  the  estimation  algorithm  must  operate  under  both 
high-noise  and  low-noise  conditions,  it  may  be  necessary  to  account  for  the 
biasing  effects  of  higher  order  terms  on  the  estimates  of  lower  order 
parameters.  While  the  biasing  can  be  unnoticeable  under  high-noise  conditions 
or  for  short  observation  intervals,  it  can  account  for  a  significant 
percentage  of  the  errors  under  low  noise  or  for  long  observation  intervals. 
This  circumstance  has  been  illustrated  for  the  case  of  a  linear-regression 
model  in  additive  Gaussian  noise.  The  resulting  analysis  proved  useful  in 
bias  compensation  for  a  hierarchical  estimation  technique. 